Automatic Speech Recognition Using Limited Vocabulary: A Survey
Abstract
Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support speech processing. However, the bulk of applications are based on well-resourced languages that overshadow under-resourced ones. Yet, ASR represents an undeniable means to promote such languages, especially when designing human-to-human or human-to-machine systems involving illiterate people. An approach to design an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on the recognition of a small number of words or sentences. This paper aims to provide a comprehensive view of the mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. This work consequently provides a way forward when designing an ASR system using a limited vocabulary. Although an emphasis is put on limited vocabulary, most of the tools and techniques reported in this survey can be applied to ASR systems in general.
Abbreviations
ACC: Accuracy; AM: Acoustic Model; ASR: Automatic Speech Recognition; BD-4SK-ASR: Basic Dataset for Sorani Kurdish Automatic Speech Recognition; CER: Character Error Rate; CMU: Carnegie Mellon University; CNN: Convolutional Neural Network; CNTK: CogNitive ToolKit; CUED: Cambridge University Engineering Department; DCT: Discrete Cosine Transformation; DL: Deep Learning; DNN: Deep Neural Network; DRL: Deep Reinforcement Learning; DWT: Discrete Wavelet Transform; FFT: Fast Fourier Transform; GMM: Gaussian Mixture Model; HMM: Hidden Markov Model; HTK: Hidden Markov Model ToolKit; JASPER: Just Another Speech Recognizer; LDA: Linear Discriminant Analysis; LER: Letter Error Rate; LGB: Light Gradient Boosting Machine; LM: Language Model; LPC: Linear Predictive Coding; LVCSR: Large Vocabulary Continuous Speech Recognition; LVQ: Learning Vector Quantization Algorithm; MFCC: Mel-Frequency Cepstrum Coefficient; ML: Machine Learning; PCM: Pulse-Code Modulation; PPVT: Peabody Picture Vocabulary Test; RASTA: RelAtive SpecTral; RLAT: Rapid Language Adaptation Toolkit; S2ST: Speech-to-Speech Translation; SAPI: Speech Application Programming Interface; SDK: Software Development Kit; SVASR: Small Vocabulary Automatic Speech Recognition; WER: Word Error Rate
Similar resources
Large vocabulary automatic speech recognition for children
Recently, Google launched YouTube Kids, a mobile application for children, that uses a speech recognizer built specifically for recognizing children’s speech. In this paper we present techniques we explored to build such a system. We describe the use of a neural network classifier to identify matched acoustic training data, filtering data for language modeling to reduce the chance of producing ...
Automatic language identification using large vocabulary continuous speech recognition
We have developed a highly accurate automatic language identification system based on large vocabulary continuous speech recognition (LVCSR). Each test utterance is recognized in a number of languages, and the language ID decision is based on the probability of the output word sequence reported by each recognizer. Recognizers were implemented for this test in English, Japanese, and Spanish, usi...
Croatian Large Vocabulary Automatic Speech Recognition
This paper presents procedures used for development of a Croatian large vocabulary automatic speech recognition system (LVASR). The proposed acoustic model is based on context-dependent triphone hidden Markov models and Croatian phonetic rules. Different acoustic and language models, developed using a large collection of Croatian speech, are discussed and compared. The paper proposes the best f...
SVMs for Automatic Speech Recognition: A Survey
Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANN...
Vocabulary Independent Speech Recognition Using Particles
A method is presented for performing speech recognition that is not dependent on a fixed word vocabulary. Particles are used as the recognition units in a speech recognition system which permits word-vocabulary independent speech decoding. A particle represents a concatenated phone sequence. Each string of particles that represents a word in the one-best hypothesis from the particle speech reco...
Journal
Journal title: Applied Artificial Intelligence
Year: 2022
ISSN: 0883-9514, 1087-6545
DOI: https://doi.org/10.1080/08839514.2022.2095039